| I-TLB property bits | ISA | CC | Protected | Interpretation | Instructions sent to | Collect profile trace-packets? | Probe for translated code | I/O memory reference exceptions |
|---|---|---|---|---|---|---|---|---|
| 00 | Tap | Tap | no | Native code observing native RISCy calling conventions | Native decoder | No | No | Fault if SEG.tio |
| 01 | Tap | x86 | no | Native code observing x86 calling conventions | Native decoder | No | No | Fault if SEG.tio |
| 10 | x86 | x86 | no | x86 code, unprotected - TAXi profile collection only | x86 HW converter | If enabled | No | Trap if profiling |
| 11 | x86 | x86 | yes | x86 code, protected - TAXi code may be available | x86 HW converter | If enabled | Based on I-TLB probe attributes | Trap if profiling |

Fig 2a Significance of the I-TLB property bits



| Transition (source => dest), ISA & CC property values | Handler Action |
|---|---|
| 00 => 00 | No transition exception |
| 00 => 01 | VECT_xxx_X86_CC exception - handler converts from native to x86 conventions |
| 00 => 1x | VECT_xxx_X86_CC exception - handler converts from native to x86 conventions, sets up expected emulator and profiling state |
| 01 => 00 | VECT_xxx_TAP_CC exception - handler converts from x86 to native conventions |
| 01 => 01 | No transition exception |
| 01 => 1x | VECT_X86_ISA exception [conditional based on PCW.X86_ISA_ENABLE flag] - sets up expected emulator and profiling state |
| 1x => 00 | VECT_xxx_TAP_CC exception - handler converts from x86 to native conventions |
| 1x => 01 | VECT_TAP_ISA exception [conditional based on PCW.TAP_ISA_ENABLE flag] - no convention conversion necessary |
| 1x => 10 | No transition exception - [profile complete possible, probe possible] |
| 1x => 11 | No transition exception - [profile complete possible, probe NOT possible] |

ISA & CC transition exception flow
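The transition table above can be read as a lookup keyed on the source and destination property bits. The following is only a minimal sketch of that reading, not the hardware's implementation: the function and constant names are hypothetical, and the two conditional ISA exceptions are returned unconditionally (the PCW enable flags are not modeled).

```python
# Hypothetical model of the ISA & CC transition table.
# Pages with property bits 10 and 11 are collapsed to "1x", as in the table.
NO_EXCEPTION = "none"

TRANSITIONS = {
    ("00", "00"): NO_EXCEPTION,
    ("00", "01"): "VECT_xxx_X86_CC",
    ("00", "1x"): "VECT_xxx_X86_CC",
    ("01", "00"): "VECT_xxx_TAP_CC",
    ("01", "01"): NO_EXCEPTION,
    ("01", "1x"): "VECT_X86_ISA",   # conditional on PCW.X86_ISA_ENABLE (not modeled)
    ("1x", "00"): "VECT_xxx_TAP_CC",
    ("1x", "01"): "VECT_TAP_ISA",   # conditional on PCW.TAP_ISA_ENABLE (not modeled)
    ("1x", "1x"): NO_EXCEPTION,
}


def collapse(bits: str) -> str:
    """Map a two-bit property value to the row/column label used in the table."""
    if bits not in ("00", "01", "10", "11"):
        raise ValueError(f"not a two-bit property value: {bits!r}")
    return "1x" if bits[0] == "1" else bits


def transition_exception(src: str, dst: str) -> str:
    """Return the exception vector family raised for a src => dst transition."""
    return TRANSITIONS[(collapse(src), collapse(dst))]
```

For example, `transition_exception("00", "11")` yields the native-to-x86 convention exception, while any transition between two x86 pages yields no exception.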



| name | description | type |
|---|---|---|
| VECT_call_X86_CC | push args, return address, set up x86 state | fault on target instruction |
| VECT_jump_X86_CC | set up x86 state | fault on target instruction |
| VECT_ret_no_fp_X86_CC | return value to eax:edx, set up x86 state | fault on target instruction |
| VECT_ret_fp_X86_CC | return value to x86 fp stack, set up x86 state | fault on target instruction |
| VECT_call_TAP_CC | x86 stack args, return address to registers | fault on target instruction |
| VECT_jump_TAP_CC | x86 stack args to registers | fault on target instruction |
| VECT_ret_no_fp_TAP_CC | return value to RV0 | fault on target instruction |
| VECT_ret_any_TAP_CC | return type unknown, setup RV0 and RVDP | fault on target instruction |

CC transition exceptions



[Figure: x86 and RISC code in a flat 32-bit "Near" address space, linked by a call]

Transparency:
. x86 code adheres to traditional x86 stack-based conventions
. RISC uses higher performance register-based conventions
. Caller has no knowledge of callee's ISA
. Callee has no knowledge of ISA to which it will return

[Figure: flat 32-bit "Near" address space; x86 code returns ("ret?") into a region containing RISC code]
. RISC -> x86 transition: map RISC call to x86
. x86 -> RISC transition: map RISC return to x86
. no ISA transition: no mapping required

[Figure: flat 32-bit "Near" address space; RISC code calls ("call?") into a region containing x86 code]
. x86 -> RISC transition: map RISC return to x86
. RISC -> x86 transition: map RISC call to x86
. no ISA transition: no mapping required

Fig. 3e

[Figure: flat 32-bit "Near" address space; x86 code calls ("call?") into a region containing RISC code]
. RISC -> x86 transition: map x86 return to RISC
. x86 -> RISC transition: map x86 call to RISC
. no ISA transition: no mapping required



[Flowchart: entry points and preambles of a native function]

x86 Preamble (need not be inline):
- Load register args
- Fill in RXA (return transfer argument area)

General Entry:
XD == 0? (YES / NO)

Native Entry:
Native Preamble (typically vacuous):
Varargs
AP for a very big argument list
omit if

Function Body:
set up XD:
XD <- <descriptor_constant>
RET



X86-to-Tapestry transition exception handler

// This handler is entered under the following conditions:
// 1. An x86 caller invokes a native function
// 2. An x86 function returns to a native caller
// 3. x86 software returns to or resumes an interrupted native function following
//    an external asynchronous interrupt, a processor exception, or a context switch

dispatch on the two least-significant bits of the destination address {
case "00" // calling a native subprogram
    // copy linkage and stack frame information and call parameters from the memory
    // stack to the analogous Tapestry registers
    LR <- [SP++]        // set up linkage register
    AP <- SP            // address of first argument
    SP <- SP - 8        // allocate return transfer argument area
    SP <- SP & (-32)    // round the stack pointer down to a 0 mod 32 boundary
    XD <- 0             // inform callee that caller uses X86 calling conventions

case "01" // resuming an X86 thread suspended during execution of a native routine
    if the redundant copies of the save slot number in EAX and EDX do not match or if
       the redundant copies of the timestamp in EBX:ECX and ESI:EDI do not match {
        // some form of bug or thread corruption has been detected
        goto TAPESTRY_CRASH_SYSTEM( thread-corruption-error-code )
    }
    save the EBX:ECX timestamp in a 64-bit exception handler temporary register
        (this will not be overwritten during restoration of the full native context)
    use save slot number in EAX to locate actual save slot storage
    restore the entire native context (includes new values for all x86 registers)
    if save slot's timestamp does not match the saved timestamp {
        // save slot has been reallocated; save slot exhaustion has been detected
        goto TAPESTRY_CRASH_SYSTEM( save-slot-overwritten-error-code )
    }
    free the save slot

case "10" // returning from X86 callee to native caller, result already in registers
    RV0<63:32> <- edx<31:00>    // in case result is 64 bits
    convert the FP top-of-stack value from 80-bit X86 form to 64-bit form in RVDP
    SP <- ESI                   // restore SP from time of call

case "11" // returning from X86 callee to native caller, load large result from memory
    RV0..RV3 <- load 32 bytes from [ESI-32]    // (guaranteed naturally aligned)
    SP <- ESI                                  // restore SP from time of call
}

EPC <- EPC & -4    // reset the two low-order bits to zero
RFE
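The register arithmetic of case "00" above can be checked with a small worked example. This is only a sketch under stated assumptions: 32-bit addresses, a 4-byte return address on the x86 stack, and memory modeled as a hypothetical dictionary from address to word; the function name is likewise illustrative.

```python
MASK32 = 0xFFFFFFFF  # assumption: the flat "near" address space is 32 bits wide


def enter_native_call(sp: int, memory: dict) -> dict:
    """Model case "00": an x86 caller invokes a native function."""
    lr = memory[sp]                # LR <- [SP++]: pop the x86 return address
    sp = (sp + 4) & MASK32
    ap = sp                        # AP <- SP: address of first stack argument
    sp = (sp - 8) & MASK32         # allocate return transfer argument area
    sp &= (~31) & MASK32           # round SP down to a 0 mod 32 boundary
    return {"LR": lr, "AP": ap, "SP": sp, "XD": 0}  # XD = 0: caller is x86


# Worked example: return address 0x401000 sits at the top of the x86 stack.
state = enter_native_call(0x7FFC, {0x7FFC: 0x00401000})
```

With the stack pointer initially at 0x7FFC, the argument pointer becomes 0x8000 and the aligned native stack pointer becomes 0x7FE0.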
Tapestry-to-X86 transition exception handler

// This handler is entered under the following conditions:
// 1. a native caller invokes an x86 function
// 2. a native function returns to an x86 caller
switch on XD<3:0> {
XD_RET_FP:    // result type is floating point
    F0/F1 <- FINFLATE.de( RVDP )       // X86 FP results are 80 bits
    SP <- from RXA save                // discard RXA, pad, args
    FPCW <- image after FINIT & push   // FP stack has 1 entry
    goto EXIT

XD_RET_WRITEBACK:    // store result to @RVA, leave RVA in eax
    RVA <- from RXA save               // address of result area
    copy decode(XD<8:4>) bytes from RV0..RV3 to [RVA]
    eax <- RVA                         // X86 expects RVA in eax
    SP <- from RXA save                // discard RXA, pad, args
    FPCW <- image after FINIT          // FP stack is empty
    goto EXIT

XD_RET_SCALAR:    // result in eax:edx
    edx<31:00> <- eax<63:32>           // in case result is 64 bits
    SP <- from RXA save                // discard RXA, pad, args
    FPCW <- image after FINIT          // FP stack is empty
    goto EXIT

XD_CALL_HIDDEN_TEMP:    // allocate 32 byte aligned hidden temp
    esi <- SP                          // stack cut back on return
    SP <- SP - 32                      // allocate max size temp
    RVA <- SP                          // RVA consumed later by RR
    LR<1:0> <- "11"                    // flag address for return & reload
    goto CALL_COMMON

default:    // remaining XD_CALL_xxx encodings
    esi <- SP                          // stack cut back on return
    LR<1:0> <- "10"                    // flag address for return
CALL_COMMON:
    interpret XD to push and/or reposition args
    [--SP] <- LR                       // push LR as return address
EXIT:
    setup emulator context and profiling ring buffer pointer
}
RFE    // to original target

Fig. 3i
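The two handlers cooperate through the low two bits of the return address: the Tapestry-to-X86 handler sets LR<1:0> to "10" or "11" before pushing it, and the X86-to-Tapestry handler dispatches on those bits and then clears them. A minimal sketch of that round trip follows; the helper names are hypothetical, and it assumes native return addresses are 4-byte aligned so the two low bits are free to carry the tag.

```python
def tag_return_address(lr: int, hidden_temp: bool) -> int:
    """Set LR<1:0>: "11" when a hidden temp must be reloaded on return, else "10"."""
    tag = 0b11 if hidden_temp else 0b10
    return (lr & ~0b11) | tag


def dispatch_case(dest: int) -> str:
    """Return the case label selected by the two least-significant bits."""
    return format(dest & 0b11, "02b")


def clear_tag(epc: int) -> int:
    """EPC <- EPC & -4: reset the two low-order bits before resuming."""
    return epc & -4


pushed = tag_return_address(0x00401000, hidden_temp=False)   # value pushed for the x86 callee
selected = dispatch_case(pushed)                              # later, on the x86 return
resume_at = clear_tag(pushed)
```

The x86 callee never inspects the tagged address; it simply returns to it, and the resulting transition exception recovers the tag.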



interrupt/exception handler of Tapestry operating system:

// Control vectors here when a synchronous exception or asynchronous interrupt is to be
// exported to / manifested in an x86 machine.
// The interrupt is directed to something within the virtual X86, and thus there is a possibility
// that the X86 operating system will context switch. So we need to distinguish two cases:
// either the running process has only X86 state that is relevant to save, or
// there is extended state that must be saved and associated with the current machine context
// (e.g., extended state in a Tapestry library call on behalf of a process managed by X86 OS)
if execution was interrupted in the converter - EPC.ISA == X86 {
    // no dependence on extended/native state possible hence no need to save any
    goto EM86_Deliver_Interrupt( interrupt-byte )
} else if EPC.Taxi_Active {
    // A TAXi translated version of some X86 code was running. TAXi will roll back to an
    // x86 instruction boundary. Then, if the rollback was induced by an asynchronous external
    // interrupt TAXi will deliver the appropriate x86 interrupt. Else, the rollback was induced
    // by a synchronous event so TAXi will resume execution in the converter, retriggering the
    // exception but this time with EPC.ISA == X86
    goto TAXi_Rollback( asynchronous-flag, interrupt-byte )
} else if EPC.EM86 {
    // The emulator has been interrupted. In theory the emulator is coded to allow for such
    // conditions and permits re-entry during long running routines (e.g. far call through a gate)
    // to deliver external interrupts
    goto EM86_Deliver_Interrupt( interrupt-byte )
} else {
    // This is the most difficult case - the machine was executing native Tapestry code on
    // behalf of an X86 thread. The X86 operating system may context switch. We must save
    // all native state and be able to locate it again when the x86 thread is resumed.
    allocate a free save slot; if unavailable free the save slot with oldest timestamp and try again
    save the entire native state (both the X86 and the extended state)
    save the X86 EIP in the save slot
    overwrite the two low-order bits of EPC with "01" (will become X86 interrupt EIP)
    store the 64-bit timestamp in the save slot, in the X86 EBX:ECX register pair (and,
        for further security, store a redundant copy in the X86 ESI:EDI register pair)
    store the number of the allocated save slot in the X86 EAX register (and, again for
        further security, store a redundant copy in the X86 EDX register)
    goto EM86_Deliver_Interrupt( interrupt-byte )
}
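The redundant register copies written in the last case above are what the X86-to-Tapestry handler later cross-checks in its case "01". A minimal sketch of both sides follows. The function names are hypothetical, and placing the high half of the timestamp in EBX (and ESI) is an assumption; the source only names the register pairs.

```python
MASK32 = 0xFFFFFFFF


def pack_resume_registers(slot_id: int, timestamp: int) -> dict:
    """Interrupt side: store slot number and 64-bit timestamp, each in duplicate."""
    hi = (timestamp >> 32) & MASK32   # assumption: EBX/ESI carry the high half
    lo = timestamp & MASK32
    return {"EAX": slot_id, "EDX": slot_id,
            "EBX": hi, "ECX": lo,
            "ESI": hi, "EDI": lo}


def unpack_resume_registers(regs: dict) -> tuple:
    """Resume side: verify the redundant copies, then recover (slot_id, timestamp)."""
    slots_match = regs["EAX"] == regs["EDX"]
    stamps_match = (regs["EBX"], regs["ECX"]) == (regs["ESI"], regs["EDI"])
    if not (slots_match and stamps_match):
        # stands in for TAPESTRY_CRASH_SYSTEM( thread-corruption-error-code )
        raise RuntimeError("thread corruption detected")
    return regs["EAX"], (regs["EBX"] << 32) | regs["ECX"]


regs = pack_resume_registers(5, 0x0123456789ABCDEF)
```

If the x86 operating system preserves the thread's registers across the context switch, the check passes and the slot number and timestamp are recovered intact; any mismatch is treated as corruption.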
TAXi code prolog generation by TAXi translator

For each native X86 segment descriptor:

    If this descriptor is marked to indicate that a cloned copy is required
    (reflecting both optimized and unoptimized references through this segment
    descriptor) then / else:

    Emit code to copy one of the X86 segment descriptors to one of the
    segment descriptor registers reserved for TAXi code. The TAXi
    optimized load bit 810 of the segment descriptor is guaranteed to match
    TAXi_Control.tio 820

    Emit code to explicitly set the value of the cloned descriptor's TAXi
    optimized load bit 810 to the opposite value.

Emit code to implement the translated hot spot of the X86 code

typedef struct {
    save_slot_t *   newer;              // pointer to next-most-recently-allocated save slot
    save_slot_t *   older;              // pointer to next-older save slot
    unsigned int64  epc;                // saved exception PC/IP
    unsigned int64  pcw;                // saved exception PCW (program control word)
    unsigned int64  registers[63];      // save the 63 writeable general registers
                                        // other words of Tapestry context
    timestamp_t     timestamp;          // timestamp to detect buffer overrun
    int             save_slot_ID;       // ID number of the save slot
    boolean         save_slot_is_full;  // full / empty flag
} save_slot_t;

save_slot_t *   save_slot_head;         // pointer to the head of the queue
save_slot_t *   save_slot_tail;         // pointer to the tail of the queue

system initialization
    to reserve several pages of unpaged memory for save slots

Fig. 3k
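The save slot structure supports two operations described in the handlers: allocating a slot (reclaiming the oldest when none is free) and restoring from a slot only if its timestamp still matches. The sketch below illustrates that behavior under simplifying assumptions: it scans a fixed list instead of maintaining the newer/older queue pointers, uses a simple counter as the timestamp source, and all class and method names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class SaveSlot:
    slot_id: int
    timestamp: int = 0
    context: dict = field(default_factory=dict)
    is_full: bool = False


class SaveSlotPool:
    def __init__(self, count: int):
        self.slots = [SaveSlot(i) for i in range(count)]
        self._clock = itertools.count(1)   # stand-in for the 64-bit timestamp source

    def allocate(self, context: dict) -> tuple:
        """Save a context; reuse the oldest slot if no slot is free."""
        free = [s for s in self.slots if not s.is_full]
        slot = free[0] if free else min(self.slots, key=lambda s: s.timestamp)
        slot.timestamp = next(self._clock)
        slot.context = dict(context)
        slot.is_full = True
        return slot.slot_id, slot.timestamp

    def restore(self, slot_id: int, timestamp: int) -> dict:
        """Return the saved context, or fail if the slot was reallocated meanwhile."""
        slot = self.slots[slot_id]
        if not slot.is_full or slot.timestamp != timestamp:
            # stands in for TAPESTRY_CRASH_SYSTEM( save-slot-overwritten-error-code )
            raise RuntimeError("save slot overwritten")
        slot.is_full = False               # free the save slot
        return slot.context


pool = SaveSlotPool(2)
first = pool.allocate({"r1": 1})
second = pool.allocate({"r1": 2})
third = pool.allocate({"r1": 3})           # pool exhausted: reclaims the oldest slot
```

Here the third allocation reuses the slot that held the first context, so a later attempt to resume the first thread is detected by the timestamp mismatch rather than silently restoring the wrong state.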




[Figure: handler boxes for transitions between x86 and RISC code in the flat 32-bit "Near" address space]

Prepare x86 excep. or int.:
. Alloc free or oldest save slot
. Store timestamp & full state
. x86 regs <- save slot ID, TS
. EPC<1:0> <- 01

Handler: RISC to x86
XD contains return-descriptor:
. Interpret XD:
  - Reformat / reposition result
  - Load FPCW
. SP <- [SP] // pop RA & args
XD contains call-descriptor:
. ESI <- SP
. Interpret XD, reposition args
. LR<1:0> <- 1x per XD
. Push LR as RA (ret addr)

Handler: x86 to RISC
EPC<1:0> = 00:
. LR <- [SP]
. SP <- SP + 4
. AP <- SP
. SP <- SP - 8 // ret area
. SP <- SP & (-32)
. XD <- 0
EPC<1:0> = 01:
. x86 regs point to save slot
. Using TS verify no overwrite
. Restore full state
. Free save slot
. EPC<1:0> <- 00
EPC<1:0> = 1x:
. Reformat / reposition the function result per EPC<0>
. SP <- ESI
. EPC<1:0> <- 00

[Figure: RISC function with x86 preamble in the flat 32-bit "Near" address space; XD <- ret-desc before the return. The handler boxes repeat the return-descriptor steps of "Handler: RISC to x86" and the EPC<1:0> = 00 steps of "Handler: x86 to RISC".]

[Figure: x86 and RISC code in the flat 32-bit "Near" address space; XD <- call-desc before the call, followed by a ret. The handler boxes repeat the call-descriptor steps of "Handler: RISC to x86" and the EPC<1:0> = 1x steps of "Handler: x86 to RISC".]
[Figure: instruction flow across page frame X, page frame Y, and page frame Z]

7 entry trace packet

| Entry | Event Code | Done Addr | Next Addr |
|---|---|---|---|
| | 64 bit time stamp | | |
| 1 | ret | x86 context | physX:f |
| 2 | new page | physY:z | physY:h |
| 3 | jcc forward | physY:i | physY:k |
| 4 | jnz backward | physY:l | physX:a |
| 5 | seq; env change | x86 context | physX:b |
| 6 | ip-rel near call | physX:c | physZ:d |
| 7 | near ret | physZ:e | physX:f |

Fig. 4a



| Code | Event | Reuse event code | Profileable event | Initiate packet | Probeable event | Probe event bit (I-TLB probe attribute or Emulator probe) |
|---|---|---|---|---|---|---|
| 0.0000 | Default (x86 transparent) event, reuse all converter values | yes | | | | |
| 0.0001 | Simple x86 instruction completion (reuse event code) | yes | | | | |
| 0.0010 | Probe exception failed | yes | | | | |
| 0.0011 | Probe exception failed, reload probe timer | yes | | | | |
| 0.0100 | flush event | no | no | no | no | - |
| 0.0101 | Sequential; execution environment changed - force event | no | yes | no | no | |
| 0.0110 | Far RET | no | yes | yes | no | |
| 0.0111 | IRET | no | yes | no | no | |
| 0.1000 | Far CALL | no | yes | yes | yes | Far call |
| 0.1001 | Far JMP | no | yes | yes | no | |
| 0.1010 | Special; emulator execution, supply extra instruction data (a) | no | yes | no | no | - |
| 0.1011 | Abort profile collection | no | no | no | no | |
| 0.1100 | x86 synchronous/asynchronous interrupt w/probe (GRP 0) | no | yes | yes | yes | Emulator probe |
| 0.1101 | x86 synchronous/asynchronous interrupt (GRP 0) | no | yes | yes | no | - |
| 0.1110 | x86 synchronous/asynchronous interrupt w/probe (GRP 1) | no | yes | yes | yes | Emulator probe |
| 0.1111 | x86 synchronous/asynchronous interrupt (GRP 1) | no | yes | yes | no | - |
| 1.0000 | IP-relative JNZ forward (opcode: 75, 0F 85) | no | yes | yes | no | - |
| 1.0001 | IP-relative JNZ backward (opcode: 75, 0F 85) | no | yes | yes | yes | Jnz |
| 1.0010 | IP-relative conditional jump forward - (Jcc, Jcxz, loop) | no | yes | yes | no | |
| 1.0011 | IP-relative conditional jump backward - (Jcc, Jcxz, loop) | no | yes | yes | yes | Cond jump |
| 1.0100 | IP-relative, near JMP forward (opcode: E9, EB) | no | yes | yes | no | |
| 1.0101 | IP-relative, near JMP backward (opcode: E9, EB) | no | yes | yes | yes | Near jump |
| 1.0110 | RET / RET imm16 (opcode: C3, C2 /w) | no | yes | yes | no | |
| 1.0111 | IP-relative, near CALL (opcode: E8) | no | yes | yes | yes | Near call |
| 1.1000 | REPE/REPNE CMPS/SCAS (opcode: A6, A7, AE, AF) | no | yes | no | no | |
| 1.1001 | REP MOVS/STOS/LODS (opcode: A4, A5, AA, AB, AC, AD) | no | yes | no | no | |
| 1.1010 | Indirect near JMP (opcode: FF /4) | no | yes | yes | no | |
| 1.1011 | Indirect near CALL (opcode: FF /2) | no | yes | yes | yes | Near call |
| 1.1100 | load from I/O memory (TLB.asi != 0) [not used in 71] | no | yes | no | no | |
| 1.1101 | | no | no | no | no | |
| 1.1110 | Default converter event; sequential | no | no | no | no | |
| 1.1111 | New page (instruction ends on last byte of a page frame or straddles across a page frame boundary) | no | yes | no | no | |

a. Used by emulator for new x86 opcodes. Extra information supplied in Taxi_Control.special_opcode bits.
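One pattern in the table is worth spelling out: for the eight near-branch event codes 1.0000 through 1.0111, the "Probeable event" column is "yes" exactly when the low bit of the code is 1 (backward branches and near CALL). The sketch below captures those eight rows and checks the pattern. Treating the dotted code as a 5-bit integer, and the names used, are assumptions made for illustration only.

```python
# (description, probeable) for event codes 1.0000 .. 1.0111, transcribed from the table.
NEAR_BRANCH_EVENTS = {
    0b10000: ("IP-relative JNZ forward", False),
    0b10001: ("IP-relative JNZ backward", True),
    0b10010: ("IP-relative conditional jump forward", False),
    0b10011: ("IP-relative conditional jump backward", True),
    0b10100: ("IP-relative near JMP forward", False),
    0b10101: ("IP-relative near JMP backward", True),
    0b10110: ("RET / RET imm16", False),
    0b10111: ("IP-relative near CALL", True),
}


def is_probeable(code: int) -> bool:
    """For the near-branch group only, probeability equals the low bit of the code."""
    if code not in NEAR_BRANCH_EVENTS:
        raise KeyError(f"not a near-branch event code: {code:#07b}")
    return bool(code & 1)


# The shortcut agrees with every tabulated row in this group.
consistent = all(is_probeable(code) == probeable
                 for code, (_, probeable) in NEAR_BRANCH_EVENTS.items())
```

The shortcut does not extend to the other rows; for example, 1.1011 (indirect near CALL) is probeable while 1.1001 (REP MOVS/STOS/LODS) is not.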
    mark the conventional descriptor to indicate that it must be cloned in the prolog

    emit a load through the descriptor to be cloned by the code emitted at 866, 868, whose TAXi
    optimized load bit 810 is One

} else if the load is known to be (or believed to be) to non-well-behaved memory {

    emit the load through the conventional segment descriptor used by the emulator, whose TAXi
    optimized load bit 810 is Zero
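The choice between the two descriptors can be summarized as a small decision function. This is a sketch only: the surviving text begins mid-flowchart, so the condition for the first branch (a load believed to be to well-behaved memory) is inferred from the wording of the "else if" branch, and the function name and result keys are hypothetical.

```python
def choose_descriptor(believed_well_behaved: bool) -> dict:
    """Pick the segment descriptor a translated load should go through."""
    if believed_well_behaved:
        # Inferred branch: use the clone set up by the prolog (bit 810 = One)
        # and mark the conventional descriptor so the prolog emits the clone.
        return {"descriptor": "cloned", "optimized_load_bit": 1,
                "mark_for_cloning": True}
    # Non-well-behaved memory: use the emulator's conventional descriptor (bit 810 = Zero).
    return {"descriptor": "conventional", "optimized_load_bit": 0,
            "mark_for_cloning": False}


optimized = choose_descriptor(True)
conservative = choose_descriptor(False)
```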